Add asynchronous load method #10327

TomNicholas · 2025-05-16T16:05:49Z

Adds a `.load_async()` method to `Variable`, which works by plumbing an async `get_duck_array` all the way down the lazy-indexing stack until it reaches the async methods that zarr-python v3 exposes.

Needs a lot of refactoring before it could be merged, but it works.
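To make "plumbing async all the way down" concrete, here is a toy sketch of the layering. It assumes four levels: a variable wrapping a lazily indexed array, wrapping a backend array, wrapping an async store. Every class and method name below is hypothetical and chosen for illustration; none of them are xarray's or zarr-python's actual APIs. The point is only that each layer awaits the one beneath it, so the event loop is never blocked on I/O.

```python
import asyncio

import numpy as np


class ToyAsyncStore:
    """Hypothetical stand-in for an async, zarr-v3-like array (not zarr's API)."""

    def __init__(self, data: np.ndarray, latency: float = 0.01):
        self._data = data
        self._latency = latency

    async def getitem(self, key):
        # Simulate network latency without blocking the event loop.
        await asyncio.sleep(self._latency)
        return self._data[key]


class ToyBackendArray:
    """Innermost wrapper: the only layer that talks to the async store."""

    def __init__(self, store: ToyAsyncStore):
        self.store = store

    async def async_getitem(self, key):
        return await self.store.getitem(key)


class ToyLazilyIndexedArray:
    """Middle wrapper: records an indexer and defers all I/O until awaited."""

    def __init__(self, array: ToyBackendArray, key=slice(None)):
        self.array = array
        self.key = key

    async def async_get_duck_array(self):
        # Pass the recorded indexer down and await the layer below.
        return await self.array.async_getitem(self.key)


class ToyVariable:
    """Outermost wrapper: load_async awaits the whole chain, then caches."""

    def __init__(self, data):
        self._data = data

    async def load_async(self):
        if isinstance(self._data, ToyLazilyIndexedArray):
            self._data = await self._data.async_get_duck_array()
        return self


store = ToyAsyncStore(np.arange(10))
v = ToyVariable(ToyLazilyIndexedArray(ToyBackendArray(store), key=slice(2, 5)))
asyncio.run(v.load_async())  # v._data is now the materialized slice [2, 3, 4]
```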

Closes #10326 (Add an asynchronous load method?)
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst

API:


doc/whats-new.rst

xarray/core/dataset.py

dcherian · 2025-08-13T01:33:47Z

xarray/core/dataset.py

+    async def load_async(self, **kwargs) -> Self:
+        # TODO refactor this to pull out the common chunked_data codepath
+
+        # this blocks on chunked arrays but not on lazily indexed arrays


FYI: dask has async compute, but it seems hard to work in here :)

https://distributed.dask.org/en/stable/asynchronous.html
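The TODO above notes that chunked (dask) arrays still block. For lazily indexed arrays, though, the payoff of an async load is that a Dataset-level method can have many variable reads in flight at once. Below is a minimal sketch of that effect, assuming the per-variable reads are combined with `asyncio.gather` (the diff excerpt does not show how `load_async` actually schedules them). The coroutine and counter names are hypothetical, and dask's asynchronous client from the linked docs is not used here.

```python
import asyncio


class InFlightCounter:
    """Tracks how many simulated reads are running at the same time."""

    def __init__(self):
        self.current = 0
        self.peak = 0


async def fake_lazy_load(name: str, counter: InFlightCounter) -> str:
    # Hypothetical stand-in for awaiting one lazily indexed variable's async read.
    counter.current += 1
    counter.peak = max(counter.peak, counter.current)
    await asyncio.sleep(0.01)  # simulated store latency
    counter.current -= 1
    return name


async def load_all_concurrently(names, counter):
    # Schedule every variable's read at once so their latencies overlap.
    return await asyncio.gather(*(fake_lazy_load(n, counter) for n in names))


async def load_all_sequentially(names, counter):
    # For comparison: awaiting one read at a time never overlaps them.
    return [await fake_lazy_load(n, counter) for n in names]


names = ["temp", "precip", "wind"]
concurrent = InFlightCounter()
sequential = InFlightCounter()
concurrent_result = asyncio.run(load_all_concurrently(names, concurrent))
sequential_result = asyncio.run(load_all_sequentially(names, sequential))
```

With `gather`, all three reads are in flight together (peak of 3), while the sequential loop never has more than one outstanding; `gather` still returns results in input order.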

xarray/core/variable.py

dcherian · 2025-08-13T01:38:36Z

xarray/tests/test_backends.py

+        not has_zarr_v3, reason="zarr-python <3 did not support async loading"
+    )
+    @pytest.mark.asyncio
+    async def test_load_async(self) -> None:


If this parallels `test_load`, then let's keep this on the base class and use `pytest.skip()` on the netCDF subclasses. That way it's easy to keep the two in sync. Let's also add a comment asking future contributors to keep the two in sync.

Yeah, I did think about doing this, but skipping an inherited method seemed messy.

I made it work in cf1d127, though. Note that the MRO becomes important here: it determines which override of the test gets inherited (so that it can be skipped). I think the cleaner solution would be to simply ask the backends whether or not they support async indexing, an idea we also discussed for #10579 (comment).
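A minimal sketch of the base-class-plus-skip pattern and of why the MRO matters. All class and helper names are hypothetical placeholders (the real suites in `xarray/tests/test_backends.py` do actual round-trips), and this is not the code from cf1d127. The suite classes are deliberately not `Test`-prefixed so pytest will not collect them here; in a real test module they would be.

```python
import asyncio

import pytest


class DatasetIOBase:
    """Hypothetical shared base holding tests that every backend inherits."""

    def roundtrip_load(self):
        # Placeholder for opening a stored dataset and loading it synchronously.
        return "loaded"

    async def roundtrip_load_async(self):
        # Placeholder for the async equivalent; yields once to the event loop.
        await asyncio.sleep(0)
        return "loaded"

    def test_load(self):
        assert self.roundtrip_load() == "loaded"

    # NOTE: keep test_load_async in sync with test_load above.
    # (In xarray this would also carry @pytest.mark.asyncio from pytest-asyncio.)
    async def test_load_async(self):
        assert await self.roundtrip_load_async() == "loaded"


class SkipAsyncMixin:
    """Overrides the inherited async test with a skip, for backends lacking async."""

    async def test_load_async(self):
        pytest.skip("this backend does not support async loading")


# Mixin listed first: the MRO finds the skipping override before the base test.
class NetCDF4Suite(SkipAsyncMixin, DatasetIOBase):
    pass


# Bases reversed: the MRO resolves test_load_async to DatasetIOBase,
# so the skip is silently bypassed and the async test would run.
class NetCDF4SuiteWrongOrder(DatasetIOBase, SkipAsyncMixin):
    pass


# A backend with async support simply inherits both tests unchanged.
class ZarrSuite(DatasetIOBase):
    pass


print([cls.__name__ for cls in NetCDF4Suite.__mro__])
# ['NetCDF4Suite', 'SkipAsyncMixin', 'DatasetIOBase', 'object']
```

A backend capability flag checked inside the base test would remove this ordering hazard, which is the cleaner option suggested above.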

xarray/tests/test_backends.py

dcherian · 2025-08-13T03:14:13Z

xarray/tests/test_variable.py

+    @pytest.mark.asyncio
+    async def test_lazy_async_indexing(self) -> None:
+        v = Variable(dims=("x", "y"), data=LazilyIndexedArray(self.d))
+        await self.check_orthogonal_async_indexing(v)


This is fine, but we could combine the sync and async checks into one async function and just use that everywhere in this file.

Done in 4f40792, though now I'm a little worried that this pattern

```python
async def check_orthogonal_indexing(self, v):
    expected = self.d[[8, 3]][:, [2, 1]]
    result = v.isel(x=[8, 3], y=[2, 1])
    assert np.allclose(result, expected)
    result = await v.isel(x=[8, 3], y=[2, 1]).load_async()
    assert np.allclose(result, expected)
```

might be passing the second assertion automatically, because the data could still be in memory after the first assertion?

You could parametrize the test over async and non-async calls:

```python
@pytest.mark.parametrize("use_async", [True, False])
async def check_orthogonal_indexing(self, v, use_async):
    expected = self.d[[8, 3]][:, [2, 1]]
    if use_async:
        result = await v.isel(x=[8, 3], y=[2, 1]).load_async()
    else:
        result = v.isel(x=[8, 3], y=[2, 1])
    assert np.allclose(result, expected)
```

Could use `assert not v._in_memory` to be really sure.

I went with the parametrization idea, though the syntax has to be a little messier because pytest can only parametrize test functions, not helper functions like this one. a074a25
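A self-contained sketch of that shape, folding in the `_in_memory` check suggested above: the `use_async` parameter lives on the collected test function and is forwarded to the async helper. `ToyLazyVariable` is a hypothetical stand-in for a lazily indexed `Variable` (it is not xarray's class; its `_in_memory` only mimics the private attribute mentioned in the thread), and the helper is driven with `asyncio.run` instead of pytest-asyncio so the sketch runs without that plugin.

```python
import asyncio

import numpy as np
import pytest


class ToyLazyVariable:
    """Hypothetical lazy wrapper (NOT xarray.Variable): materializes only on load."""

    def __init__(self, source: np.ndarray, key=(slice(None), slice(None))):
        self._source = source
        self._key = key
        self._values = None

    @property
    def _in_memory(self) -> bool:
        return self._values is not None

    def isel(self, x, y):
        # Orthogonal (outer) indexing, recorded lazily; nothing is read yet.
        return ToyLazyVariable(self._source, (x, y))

    def load(self):
        x, y = self._key
        self._values = self._source[x][:, y]
        return self

    async def load_async(self):
        await asyncio.sleep(0)  # stand-in for awaiting an async store read
        return self.load()

    def __array__(self, dtype=None, copy=None):
        # Implicit conversion (e.g. inside np.allclose) triggers a sync load.
        if not self._in_memory:
            self.load()
        return np.asarray(self._values, dtype=dtype)


async def check_orthogonal_indexing(d, v, use_async):
    expected = d[[8, 3]][:, [2, 1]]
    indexed = v.isel(x=[8, 3], y=[2, 1])
    # Guards against the worry above: nothing may be cached before the check.
    assert not indexed._in_memory
    if use_async:
        result = await indexed.load_async()
    else:
        result = indexed  # loaded implicitly by np.allclose below
    assert np.allclose(result, expected)
    assert indexed._in_memory


# The parameter sits on the test function, which pytest can parametrize.
@pytest.mark.parametrize("use_async", [True, False])
def test_orthogonal_indexing(use_async):
    d = np.arange(100.0).reshape(10, 10)
    asyncio.run(check_orthogonal_indexing(d, ToyLazyVariable(d), use_async))
```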

xarray/tests/test_backends.py

dcherian · 2025-08-13T03:23:26Z

xarray/tests/test_backends.py

+    @pytest.mark.parametrize("cls_name", ["Variable", "DataArray", "Dataset"])
+    @pytest.mark.parametrize(
+        "indexer, method, target_zarr_class",
+        [


👏🏾 👏🏾 👏🏾

xarray/tests/test_backends.py

dcherian

Amazing work!

I left some minor comments that should be easy to address.

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

…function

…ata using zarr-python v2

TomNicholas and others added 21 commits October 24, 2024 17:48

new blank whatsnew

01e7518

Merge branch 'main' of https://github.com/pydata/xarray

83e553b

Merge branch 'main' of https://github.com/pydata/xarray

e44326d

Merge branch 'main' of https://github.com/pydata/xarray

4e4eeb0

Merge branch 'main' of https://github.com/pydata/xarray

d858059

Merge branch 'main' of https://github.com/pydata/xarray

d377780

Merge branch 'main' of https://github.com/pydata/xarray

3132f6a

Merge branch 'main' of https://github.com/pydata/xarray

900eef5

Merge branch 'main' of https://github.com/pydata/xarray

4c4462f

Merge branch 'main' of https://github.com/pydata/xarray

5b9b749

Merge branch 'main' of https://github.com/pydata/xarray

fadb953

Merge branch 'main' of https://github.com/TomNicholas/xarray

57d9d23

Merge branch 'main' of https://github.com/pydata/xarray

11170fc

Merge branch 'main' of https://github.com/pydata/xarray

0b8fa41

Merge branch 'main' of https://github.com/pydata/xarray

f769f85

Merge branch 'main' of https://github.com/pydata/xarray

4eef318

Merge branch 'main' of https://github.com/pydata/xarray

29242a4

test async load using special zarr LatencyStore

e6b3b3b

don't use dask

3ceab60

async all the way down

071c35a

remove assert False

29374f9

TomNicholas added the enhancement label May 16, 2025

github-actions bot added the topic-backends, topic-indexing, topic-documentation, topic-zarr, io, and topic-NamedArray labels May 16, 2025

pre-commit-ci bot and others added 2 commits May 16, 2025 16:07

[pre-commit.ci] auto fixes from pre-commit.com hooks

ab12bb8

for more information, see https://pre-commit.ci

add pytest-asyncio to CI envs

62aa39d